Paraphrase Generation as Monolingual Translation: Data and Evaluation
Abstract
In this paper we investigate the automatic generation and evaluation of sentential paraphrases. We describe a method for generating sentential paraphrases by using a large aligned monolingual corpus of news headlines acquired automatically from Google News and a standard Phrase-Based Machine Translation (PBMT) framework. The output of this system is compared to a word substitution baseline. Human judges prefer the PBMT paraphrasing system over the word substitution system. We demonstrate that BLEU correlates well with human judgements provided that the generated paraphrased sentence is sufficiently different from the source sentence.
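The evaluation idea in the abstract, scoring a paraphrase with BLEU only when it differs enough from its source, can be illustrated with a minimal sketch. The abstract does not say how "sufficiently different" is measured, so the word-level Levenshtein distance, the `min_distance` threshold, and the simplified unsmoothed BLEU below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word from a
                            curr[j - 1] + 1,      # insert a word from b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, references, max_n=2):
    """Simplified sentence-level BLEU (assumption: no smoothing, uniform weights)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


def score_if_different(source, paraphrase, references, min_distance=2):
    """Return BLEU only when the paraphrase is far enough from the source.

    min_distance is a hypothetical threshold chosen for this sketch.
    """
    if word_edit_distance(source, paraphrase) < min_distance:
        return None  # too close to the source for BLEU to be informative
    return simple_bleu(paraphrase, references)


source = "police arrest man after bank robbery".split()
paraphrase = "man held by police after bank robbery".split()
references = ["man arrested by police after bank robbery".split()]
print(score_if_different(source, paraphrase, references))
```

Returning `None` for near-copies keeps them out of any later correlation with human judgements, which mirrors the condition stated in the abstract; a real evaluation would use an established BLEU implementation and multiple references.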
Similar Resources
Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation
Paraphrase can help match synonyms or match phrases with the same or similar meaning, thus it plays an important role in automatic evaluation of machine translation. The traditional approaches extract paraphrase in general domain from bilingual corpus. Because the WMT16 metrics task consists of three subtasks, namely news domain, medical domain, and IT domain, we propose to extract domainspecif...
Creating and using large monolingual parallel corpora for sentential paraphrase generation
In this paper we investigate the automatic generation of paraphrases by using machine translation techniques. Three contributions we make are the construction of a large paraphrase corpus for English and Dutch, a re-ranking heuristic to use machine translation for paraphrase generation and a proper evaluation methodology. A large parallel corpus is constructed by aligning clustered headlines th...
Monolingual Machine Translation for Paraphrase Generation
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextu...
Neural Paraphrase Generation using Transfer Learning
Progress in statistical paraphrase generation has been hindered for a long time by the lack of large monolingual parallel corpora. In this paper, we adapt the neural machine translation approach to paraphrase generation and perform transfer learning from the closely related task of entailment generation. We evaluate the model on the Microsoft Research Paraphrase (MSRP) corpus and show that the ...
Using Monolingual Human Computation to Improve Language Translation via Targeted Paraphrase
We introduce a new approach to the problem of obtaining cost-effective, reasonable quality translation, by exploiting simple and inexpensive human computations by monolingual speakers. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate new ways to say the same thing...